coupled-tagger-r13
Zhenghua Li
http://hlt.suda.edu.cn/~zhli/index.htm
2015.5.17

This package includes the code, data, and other materials for
``Zhenghua Li, Jiayuan Chao, Min Zhang, Wenliang Chen. 2015. Coupled Sequence Labeling on Heterogeneous Annotations: POS Tagging as a Case Study. ACL.''
Please cite this paper if these resources help your work.
The code has been tested on both Windows and Linux.

--------------

Package overview:
-- newly-annotated-pd-sentences-with-ctb-tags: currently includes 1,000 sentences from People's Daily (PD) newly annotated with CTB tags.
-- mapping-function: files encoding the four mapping functions described in our paper.
-- run-experiments-acl15-relaxed: run.sh and the configuration file used in our paper.
-- sample-data
-- codes

--------------

Running the code:

Compilation:
$ cd codes; make
should suffice; the executable is named ``tagger-r''.

Train & test:
$ ./tagger-r config.txt --train=1 --test=0 --dictionary-exist=0 --thread-num=10 --iter-num=500 > create-model.log 2>&1 
$ ./tagger-r config.txt --train=1 --test=0 --dictionary-exist=1 --thread-num=10 --iter-num=500 > train.log 2>&1 
$ ./tagger-r config.txt --train=0 --test=1 --param-num-for-eval=5 --thread-num=10 > test.log 2>&1

``config.txt'' is the configuration file.
Options can also be given on the command line in the form "--name=value"; these override the corresponding options in the configuration file.
The major options are explained in README-configuration-options.txt.
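
As a rough illustration of this override order (not the package's actual parser), the C++ sketch below reads options from a stream and then lets "--name=value" arguments replace them. The config-file syntax it assumes (one "name = value" pair per line, '#' for comment lines) and the names parse_config(), load_options() and Options are hypothetical; README-configuration-options.txt and config.txt describe the real format.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>

using Options = std::map<std::string, std::string>;  // option name -> raw string value

// Removes leading and trailing whitespace.
static std::string trim(const std::string& s) {
    const auto b = s.find_first_not_of(" \t\r\n");
    if (b == std::string::npos) return "";
    const auto e = s.find_last_not_of(" \t\r\n");
    return s.substr(b, e - b + 1);
}

// ASSUMED format (hypothetical): one "name = value" pair per line; '#' starts a comment line.
Options parse_config(std::istream& in) {
    Options opts;
    std::string line;
    while (std::getline(in, line)) {
        line = trim(line);
        if (line.empty() || line[0] == '#') continue;
        const auto eq = line.find('=');
        if (eq == std::string::npos) continue;  // ignore malformed lines
        opts[trim(line.substr(0, eq))] = trim(line.substr(eq + 1));
    }
    return opts;
}

// Config-file values are loaded first; "--name=value" arguments then overwrite them.
// Arguments without a leading "--" (such as "config.txt") are skipped.
Options load_options(std::istream& config, int argc, const char* const argv[]) {
    assert(argc == 0 || argv != nullptr);
    Options opts = parse_config(config);
    for (int i = 1; i < argc; ++i) {
        const std::string arg = argv[i];
        if (arg.rfind("--", 0) != 0) continue;
        const auto eq = arg.find('=');
        if (eq == std::string::npos) continue;
        opts[arg.substr(2, eq - 2)] = arg.substr(eq + 1);
    }
    return opts;
}

// Usage (e.g. inside main):
//   std::ifstream cfg("config.txt");
//   Options opts = load_options(cfg, argc, argv);
```

Keeping every value as a string and converting it only where it is read keeps the override step trivial: a later source simply replaces the map entry.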

For a complete example, see run-experiments-acl15-relaxed/run.sh.

--------------

Notable features of the code:
-- Efficient 2D/3D/4D matrix implementation
-- Multi-threaded implementation
-- Flexible feature design: you can directly modify FGen::addPOSFeature_unigram() and 
		FGen::addPOSFeature_bigram() to add complex features that are hard to express with feature templates (unlike CRF++/CRFsuite/Maxent). 
-- A general, reusable framework for implementing ML algorithms (I/O, feature extraction, feature indexing, feature weights, training, decoding, evaluation, ...)
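
To illustrate the kind of feature that is easy to write in code but awkward as a template, here is a hypothetical C++ stand-in for a unigram feature function. The names add_unigram_features() and utf8_chars(), and the feature strings themselves, are illustrative assumptions; they do not reproduce FGen::addPOSFeature_unigram().

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Splits a UTF-8 string into characters by inspecting each lead byte.
// Assumes well-formed UTF-8 (no validation is done).
std::vector<std::string> utf8_chars(const std::string& s) {
    std::vector<std::string> chars;
    for (std::size_t i = 0; i < s.size();) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        const std::size_t len = c >= 0xF0 ? 4 : c >= 0xE0 ? 3 : c >= 0xC0 ? 2 : 1;
        chars.push_back(s.substr(i, len));
        i += len;
    }
    return chars;
}

// Hypothetical hand-written unigram feature function:
// appends string features for word i paired with a candidate tag.
void add_unigram_features(const std::vector<std::string>& words, std::size_t i,
                          const std::string& tag, std::vector<std::string>& feats) {
    assert(i < words.size());
    const std::string& w = words[i];
    const std::string prev = i > 0 ? words[i - 1] : std::string("<s>");
    const std::string next = i + 1 < words.size() ? words[i + 1] : std::string("</s>");
    feats.push_back("w=" + w + "|t=" + tag);
    feats.push_back("w-1=" + prev + "|t=" + tag);
    feats.push_back("w+1=" + next + "|t=" + tag);

    const std::vector<std::string> chars = utf8_chars(w);
    if (!chars.empty()) {
        feats.push_back("first=" + chars.front() + "|t=" + tag);
        feats.push_back("last=" + chars.back() + "|t=" + tag);
    }
    // Procedural logic that a fixed template cannot easily express:
    // one feature per character that immediately repeats (e.g. reduplication).
    for (std::size_t k = 1; k < chars.size(); ++k)
        if (chars[k] == chars[k - 1])
            feats.push_back("dup=" + chars[k] + "|t=" + tag);
}
```

The loop over repeated characters fires a variable number of features per word, which is the sort of rule that is simple in code but has no direct counterpart in a fixed feature-template file.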

I strongly recommend that NLP newcomers read the code; I believe a lot of fundamentals can be learned from it.

--------------

Parts of the code are borrowed from other resources:

-- Our code is based on the EGSTRA package by Xavier Carreras, Terry Koo, and Mihai Surdeanu. I have learned a lot from their code.
-- The multi-dimensional array implementation is borrowed from "Numerical Recipes in C++" by William H. Press et al.
-- The code for multi-thread control is from ThreadPool by Stephen Liu <stephen.nil@gmail.com> 
-- LBFGS/L2SGD optimization from CRF++
-- UTF-8 handling from ZPar
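
The contiguous-storage idea behind Numerical Recipes-style multi-dimensional arrays can be sketched as follows: one flat data block plus pointer tables, so that a[i][j][k] indexing works while all elements stay contiguous in memory. The class name Array3D is hypothetical, and this is a minimal sketch rather than the package's own implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 3-D array: one contiguous data block plus two pointer tables
// (planes -> rows -> data). T must not be bool (std::vector<bool> is packed).
template <typename T>
class Array3D {
public:
    Array3D(std::size_t n1, std::size_t n2, std::size_t n3, const T& init = T())
        : n1_(n1), data_(n1 * n2 * n3, init), rows_(n1 * n2), planes_(n1) {
        assert(n1 > 0 && n2 > 0 && n3 > 0);
        for (std::size_t i = 0; i < n1; ++i) {
            planes_[i] = &rows_[i * n2];  // row pointers belonging to plane i
            for (std::size_t j = 0; j < n2; ++j)
                rows_[i * n2 + j] = &data_[(i * n2 + j) * n3];  // start of row (i, j)
        }
    }

    // A copy would keep pointer tables aimed at the source object's buffers,
    // so copying is disabled; moving transfers the buffers and stays valid.
    Array3D(const Array3D&) = delete;
    Array3D& operator=(const Array3D&) = delete;
    Array3D(Array3D&&) = default;
    Array3D& operator=(Array3D&&) = default;

    // a[i] yields T**, so a[i][j][k] is a plain T& with no extra arithmetic.
    T** operator[](std::size_t i) { assert(i < n1_); return planes_[i]; }

    T* data() { return data_.data(); }  // flat, contiguous view of all elements
    std::size_t size() const { return data_.size(); }

private:
    std::size_t n1_;
    std::vector<T> data_;
    std::vector<T*> rows_;
    std::vector<T**> planes_;
};
```

Because the elements are contiguous, the whole array can be cleared or scanned in one pass over data(), which is one reason this layout suits large weight and score tables.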

--------------

LICENSE

This package is free for non-commercial use.
You may use, modify, or redistribute this package for non-commercial purposes.
However, the code borrowed from the resources listed above remains subject to its original license terms, which we strictly follow.
We would be grateful if you properly cite our work.
For commercial use, please contact zhenghualiir@gmail.com.
